Goto

Collaborating Authors

 nyi descent


Supplementary Material for Mixture weights optimisation for Alpha-Divergence Variational Inference Kamélia Daudel1,2, Randal Douc3

Neural Information Processing Systems

Assume that p and k are as in (A1). Then, the two following assertions hold. A.3 The case α < 1 for the Power Descent algorithm Let α = 1, η (0,1], κbe such that (α 1)κ 0and let the initial probability measure µ1 M1(T) be such that Ψα(µ1) < . A common way to approximate intractable integrals of the form (16) is to resort to Importance Sampling methods and in that case we are also interested in ensuring that the support of the variational approximation q Q (with q = µk in our case) is included in the support of p. Seeking to solve the Variational Inference optimation problem inf Dα(µK||P) for α < 1 enables this to happen, as opposed to the case α 1 for which the α-divergenve exhibits the so-called mode-seeking property [2, 3, 4]. As a whole, well-chosen samplers and variance reduction methods appear to be a necessity even in the case α = 1 so that the obtained Monte Carlo estimator of θ 7 bµ,α(θ)do not suffer from a too large variance.


Mixture weights optimisation for Alpha-Divergence Variational Inference

Neural Information Processing Systems

This paper focuses on α-divergence minimisation methods for Variational Inference. We consider the case where the posterior density is approximated by a mixture model and we investigate algorithms optimising the mixture weights of this mixture model by α-divergence minimisation, without any information on the underlying distribution of its mixture components parameters. The Power Descent, defined for all α = 1, is one such algorithm and we establish in our work the full proof of its convergence towards the optimal mixture weights when α < 1. Since the α-divergence recovers the widely-used exclusive Kullback-Leibler when α 1, we then extend the Power Descent to the case α = 1 and show that we obtain an Entropic Mirror Descent. This leads us to investigate the link between Power Descent and Entropic Mirror Descent: first-order approximations allow us to introduce the Rényi Descent, a novel algorithm for which we prove an O(1/N) convergence rate. Lastly, we compare numerically the behavior of the unbiased Power Descent and of the biased Rényi Descent and we discuss the potential advantages of one algorithm over the other.


MixtureweightsoptimisationforAlpha-Divergence VariationalInference

Neural Information Processing Systems

The Power Descent, defined for allα = 1, is one such algorithm and we establish in our work the full proof ofits convergence towards the optimal mixture weights whenα < 1.